Bistable Gradient Networks II: Storage Capacity and Behaviour Near Saturation 



Patrick McGraw and Michael Menzinger 

University of Toronto 
(Dated: February 1, 2008) 

We examine numerically the storage capacity and the behaviour near saturation of an attractor 
neural network consisting of bistable elements with an adjustable coupling strength, the Bistable 
Gradient Network (BGN). For strong coupling, we find evidence of a first-order "memory blackout" 
phase transition as in the Hopfield network. For weak coupling, on the other hand, there is no 
evidence of such a transition and memorized patterns can be stable even at high levels of loading. 
The enhanced storage capacity comes, however, at the cost of imperfect retrieval of the patterns 
from corrupted versions.

I. INTRODUCTION

In this paper we consider the behaviour at high memory loading of a Hopfield-like attractor network of $N$ bistable elements, the bistable gradient network or BGN [1, 12]. This is a sequel to an earlier paper [2] which considered the BGN at low loading. We compare the BGN to the deterministic Hopfield network (HN), examining the storage capacity and other key properties.

To begin, we review the BGN model and establish some notation. The BGN is described by the coupled differential equations

$$\frac{dx_i}{dt} = -\frac{\partial H}{\partial x_i}, \tag{1}$$

where $x_i$ are $N$ continuous-valued real variables (or components of an $N$-dimensional state vector $\mathbf{x}$) representing the outputs of the $N$ nodes of the network, and $H$ is the Hamiltonian

$$H = H_0 + H_{\mathrm{int}} = \sum_{i=1}^{N}\left(\frac{x_i^4}{4} - \frac{x_i^2}{2}\right) - \frac{\gamma}{2}\sum_{i,j=1}^{N} w_{ij} x_i x_j. \tag{2}$$

The quantities $w_{ij}$ are the elements of a symmetric matrix of connection strengths and $\gamma$ is a control parameter determining the overall strength of the coupling among nodes. As in the Hopfield model [?], the connections or synaptic weights are determined by the Hebb learning rule [10]

$$w_{ij} = \frac{1}{N}\sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu \quad (i \neq j), \qquad w_{ii} = 0, \tag{3}$$

where the $N$-dimensional vectors $\boldsymbol{\xi}^\mu$ represent a set of $p$ distinct memory patterns to be recognized by the network. We take these patterns to consist of binary elements $\pm 1$ only, and we assign them random values, thus introducing quenched disorder. The BGN's key difference from the HN and from most of its continuous-valued relatives [11] lies in the presence of the local quartic potential term $H_0$ in the Hamiltonian, which renders each node bistable.

The interaction term $H_{\mathrm{int}}$ furnishes an input to each node given by $h_i = \gamma \sum_{j=1}^{N} w_{ij} x_j$, so that the dynamical equation for each node is given by

$$\frac{dx_i}{dt} = x_i - x_i^3 + h_i. \tag{4}$$
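As an illustration of equations (1)-(4), the following minimal sketch integrates the node equations with a forward-Euler step, starting from a stored pattern. The step size, tolerance, network size and the helper names `hebb_weights` and `relax_bgn` are assumptions made for this example and are not taken from the paper; the weight matrix is built from the Hebb rule with the diagonal set to zero.

```python
import numpy as np

def hebb_weights(patterns):
    """Hebb rule with zero diagonal: w_ij = (1/N) sum_mu xi_i^mu xi_j^mu, w_ii = 0."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def relax_bgn(x0, w, gamma, dt=0.05, tol=1e-8, max_steps=100_000):
    """Forward-Euler integration of dx_i/dt = x_i - x_i^3 + gamma * sum_j w_ij x_j.

    Stops when the largest component of the velocity falls below tol.
    """
    x = np.array(x0, dtype=float)
    for _ in range(max_steps):
        dx = x - x**3 + gamma * (w @ x)
        x += dt * dx
        if np.max(np.abs(dx)) < tol:
            break
    return x

rng = np.random.default_rng(0)
N, p, gamma = 200, 2, 1.0                  # illustrative sizes, low loading
xi = rng.choice([-1.0, 1.0], size=(p, N))  # random binary patterns
w = hebb_weights(xi)

x_final = relax_bgn(xi[0], w, gamma)       # start exactly on pattern 0
m = xi @ x_final / N                       # overlaps with each stored pattern
print("overlaps m_mu:", m)
```

At such low loading the final state should keep the signs of the stored pattern, with an overlap close to the value $\sqrt{1+\gamma}$ expected for an isolated pattern.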

The input may also be referred to as a "magnetic field" by analogy with Ising spin systems. In the absence of input each node has two stable fixed points at $\pm 1$, but a nonzero field shifts the two fixed points. If the critical magnitude $h_c = 2\sqrt{3}/9 \approx 0.38$ is exceeded then the field is strong enough to overcome the potential barrier in the quartic potential and there is then only one fixed point.
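The critical field can be checked numerically. The sketch below finds the real fixed points of a single node in a constant field $h$ by solving $x - x^3 + h = 0$; the helper name `fixed_points` and the tolerance used to discard complex roots are choices made for this example.

```python
import numpy as np

def fixed_points(h, imag_tol=1e-9):
    """Return the real roots of x - x**3 + h = 0, sorted in increasing order."""
    roots = np.roots([-1.0, 0.0, 1.0, h])
    real = roots[np.abs(roots.imag) < imag_tol].real
    return np.sort(real)

# h_c is the maximum of x - x**3, which is attained at x = 1/sqrt(3).
h_c = 2.0 * np.sqrt(3.0) / 9.0

for h in (0.0, 0.3, 0.5):
    print(f"h = {h}: fixed points {fixed_points(h)}")
```

For $|h| < h_c$ there are three real fixed points (two stable, one unstable between them); for $|h| > h_c$ only one remains.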

We define two sets of order parameters $m_\mu$ and $b_\mu$ by

$$m_\mu = \frac{1}{N}\sum_{i=1}^{N} \xi_i^\mu x_i, \tag{5}$$

$$b_\mu = \frac{1}{N}\sum_{i=1}^{N} \xi_i^\mu\, \mathrm{sgn}(x_i). \tag{6}$$

The $m_\mu$ are inner products or overlaps of the state vector with the memorized patterns, while the $b_\mu$, the "bit overlaps," encode information about sign agreements between the state vector and the stored patterns. For the purposes of this paper, we will for the most part be more interested in $b_\mu$ than in $m_\mu$, so where there is little risk of confusion we will drop the word "bit" and simply refer to $b_\mu$ as an "overlap."
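To make the distinction between the two order parameters concrete, the short sketch below evaluates both for a state that agrees in sign with a pattern on 90 of 100 nodes. The function name `overlaps`, the amplitude 1.4 and the sizes are illustrative assumptions.

```python
import numpy as np

def overlaps(x, patterns):
    """Return (m, b): overlaps m_mu and bit overlaps b_mu for every stored pattern."""
    n = patterns.shape[1]
    m = patterns @ x / n
    b = patterns @ np.sign(x) / n
    return m, b

rng = np.random.default_rng(0)
N = 100
xi = rng.choice([-1.0, 1.0], size=(1, N))  # one stored pattern

x = 1.4 * xi[0]      # continuous state aligned with the pattern
x[:10] *= -1.0       # flip the signs of the first 10 nodes

m, b = overlaps(x, xi)
print(m[0], b[0])
```

Here $b$ counts only sign agreements, $(90 - 10)/100 = 0.8$, while $m$ also carries the amplitude of the state, $1.4 \times 0.8 = 1.12$.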

The degree of loading of the network's memory can be parametrized by the ratio $\alpha = p/N$. In the companion paper [2] we examined the behaviour of the BGN in the low-loading limit $\alpha \ll 1$. It was shown that the network can function as an associative memory and correct sign flip errors in a stored pattern as long as $\gamma > 1/3$. We found that the attractors of the BGN's dynamics are readily classified into three categories which are separated from each other in energy. The lowest energy states are the memory or retrieval states, each of which corresponds to one of the memorized patterns. In addition to these there are higher-energy spurious attractors of two types. The mixture or spin glass states have partial overlaps with several patterns and thus lie close to the subspace spanned by the memory patterns. Uncondensed states, which have no counterpart in the HN, are states in which none of the fields acting on the nodes are strong enough to cause sign flips and the dynamics is therefore dominated by the local potential. They have energies per node close to $-0.25$. The spin glass states are intermediate in energy between the memory states and the uncondensed states. In the range $1/3 < \gamma \lesssim 1$, pattern recognition by the network was found to be highly selective; the input must be close to the stored pattern in order for the pattern to be fully restored. The uncondensed states are numerous (of order $2^N$) and fill most of the configuration space. The memorized patterns and their basins of attraction occupy isolated valleys among these states, but these valleys expand as $\gamma$ increases. For $\gamma \gtrsim 1$, on the other hand, the behaviour resembles that of the HN: there are no uncondensed states, and the memory states have large basins of attraction.

In this paper we turn to the case where $p/N$ is of order unity. We are interested in the maximum storage capacity, or the maximum number of patterns that can be successfully stored and retrieved, as well as changes in the network's performance as this limit is approached. Earlier work with small networks [1, 12] suggested that at least under certain conditions the BGN could store many more patterns than the HN while possessing few spurious attractors. It is known that the Hopfield storage limit of $p \approx 0.14N$ memory patterns can be exceeded if a more complicated learning algorithm is used [?], but in the BGN case improved capacity is achieved with the familiar Hebb rule. Since the previous results [1, 12] were obtained with networks much too small to be of practical interest (e.g., $N = 5$), we now examine larger networks, mainly through numerical simulations. (At the end of the paper we will return briefly to the small-network case.) We find that the high-loading behaviour, like that at low loading, depends strongly on $\gamma$. For $\gamma > 1$, a Hopfield-like first-order phase transition results in the destabilization of all memory states at a critical value of $\alpha$. For $\gamma = 2$ this transition occurs at $\alpha_c \approx 0.1$, compared to $\alpha_c \approx 0.14$ for the HN. For $\gamma = 0.5$, on the other hand, we find that it is possible for the stored patterns to remain stable even at loading factors of $\alpha \approx 0.3$ and higher. Furthermore, there is no sudden blackout; instead, the performance degrades gradually as $\alpha$ increases. The price of this high capacity is that the retrieval of the patterns from corrupted versions may be imperfect.

The remainder of the paper is organized as follows: In section II we discuss in general terms the effects of crosstalk, or interference between different stored patterns. It is crosstalk which is responsible for limitations on storage capacity. We compare the effects of crosstalk in the BGN and the HN. This discussion provides a framework for interpreting our numerical results. In section III we examine numerically the stability of memory patterns as a function of $\alpha$. We find evidence of a first-order memory blackout phase transition in the BGN at high $\gamma$, but not at low $\gamma$. In section IV, we examine the effects of high loading on the basins of attraction for the memory states and on their retrieval from corrupted input. We see that increasing the loading markedly alters the energy landscape. In section V we comment briefly on the behaviour of smaller networks and on the relation between the previous small network results [1, 12] and those we have obtained for larger networks. We conclude with some discussion of the BGN results and possible directions for further study. We make some conjectures concerning the performance of the BGN in the presence of stochastic noise.

II. PATTERN RETRIEVAL AND CROSSTALK 

In a given state $\mathbf{x}$, the input to the $i$-th node of the network can be expressed as

$$h_i = \gamma \sum_{j=1}^{N} w_{ij} x_j = \gamma \left( \sum_{\mu=1}^{p} \xi_i^\mu m_\mu - \frac{p}{N} x_i \right), \tag{7}$$

using the Hebb rule (3) and the definition (5) of $m_\mu$. In [2], we showed that when $\alpha \ll 1$ there are stable retrieval states in which $b_\nu = 1$ for one particular $\nu$. For a state in which the $\nu$-th overlap is large, we can decompose the field as follows:

$$h_i = \gamma \left( \xi_i^\nu m_\nu + \sum_{\mu \neq \nu} \xi_i^\mu m_\mu - \frac{p}{N} x_i \right) = \gamma \left( S_i + C_i - \frac{p}{N} x_i \right). \tag{8}$$

We refer to the first term, $S_i$, as the "signal" term and the second term, $C_i$, as the "crosstalk" term. In the limit $p/N \to 0$, the mutual overlaps between different patterns are small, so that $m_\mu \ll 1$ ($\mu \neq \nu$), and the last two terms in (8) can be neglected. The signal term is then dominant and there is a stable retrieval state given by $\mathbf{x} = \sqrt{1+\gamma}\,\boldsymbol{\xi}^\nu$ with $m_\nu = \sqrt{1+\gamma}$. We expect this solution to be approximately valid for small but nonzero values of $p/N$. For this case, the overlaps $m_\mu$ ($\mu \neq \nu$) behave as Gaussian distributed random variables with zero mean and standard deviation of order $1/\sqrt{N}$. Accordingly, the crosstalk term $C_i = \sum_{\mu \neq \nu} \xi_i^\mu m_\mu$, being a sum of $p-1$ such quantities, is a random quantity with zero mean and standard deviation of order $\sqrt{p/N}$. The third term, which arises from the subtraction of the diagonal elements, is of order $p/N$ and thus is generally smaller than the crosstalk term. We will neglect it for the moment.
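The $\sqrt{p/N}$ scaling of the crosstalk can be checked with a short numerical experiment. The sketch below is an illustration under simplifying assumptions: the state is set equal to a stored pattern with unit amplitude (rather than $\sqrt{1+\gamma}$), and the sizes and seed are arbitrary. The self-term that the diagonal correction in (8) removes is left inside $C_i$, which changes the spread only slightly at this loading.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 2000, 100                        # illustrative sizes, p/N = 0.05
xi = rng.choice([-1.0, 1.0], size=(p, N))

nu = 0
x = xi[nu].copy()                       # state aligned with pattern nu, unit amplitude
m = xi @ x / N                          # overlaps m_mu; m[nu] equals 1 here

others = np.arange(p) != nu
C = xi[others].T @ m[others]            # crosstalk C_i for every node i

print("sample mean of C_i:", C.mean())
print("sample std of C_i :", C.std())
print("sqrt(p/N)         :", np.sqrt(p / N))
```

The sample standard deviation should be close to $\sqrt{p/N} \approx 0.22$, while the signal term has magnitude 1 in this normalization.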

In the absence of crosstalk, a retrieval state is not only linearly stable (i.e., stable against small perturbations) but, for $\gamma > 1/3$, it is also stable against individual sign flips. The latter means that if a retrieval state is corrupted by changing the sign of one node or of a small number ($\ll N$) of nodes, then the dynamics will reverse the flipped sign and restore the pattern. This happens because, in the absence of crosstalk, each node experiences a field $\gamma S_i = \gamma \xi_i^\nu m_\nu$ which has the same sign as $\xi_i^\nu$, and for $\gamma > 1/3$ that field is strong enough to overcome the potential barrier of the individual node. Now consider a given node (say, the $i$-th node) in the presence of crosstalk. The crosstalk field acting on the $i$-th node may be either aligned with or opposed to $x_i$. If it is aligned, then its effect on that node is to increase the magnitude of $x_i$, making it larger than $\sqrt{1+\gamma}$. If it is opposed to $x_i$, then its effect is to decrease the equilibrium magnitude of $x_i$. If the crosstalk term is large enough, then it may be sufficient to overcome the local potential barrier and reverse the sign of $x_i$, thus introducing a sign error into the pattern. Taking $\xi_i^\nu = +1$ for definiteness, this will occur only if

$$C_i + S_i < -\frac{2\sqrt{3}}{9\gamma}.$$

By contrast, in the Hopfield model a sign error is introduced if

$$C_i + S_i < 0.$$

Thus the relative strength of crosstalk required to introduce sign errors is greater for the BGN than for the HN, and the discrepancy is greatest for small values of $\gamma$. One might expect that this would make the BGN less vulnerable to crosstalk (and the memory states more stable) than the HN, especially at low $\gamma$, but this is not a foregone conclusion as the BGN's dynamics include mechanisms which tend to amplify small initial overlaps [2] and could conceivably also amplify crosstalk. Our numerical results confirm that the BGN is in fact less prone to crosstalk-induced errors at low values of $\gamma$, but not at high values.
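As a consistency check, the coupling threshold $\gamma = 1/3$ for correcting a single sign flip can be recovered from the single-node equation and the critical field. The following is a sketch that neglects the crosstalk and diagonal terms, as in the low-loading limit, and takes the critical field of an isolated node to be $h_c = 2\sqrt{3}/9$, the maximum of $x - x^3$:

$$\begin{aligned}
&\text{Retrieval state } \mathbf{x} = c\,\boldsymbol{\xi}^\nu,\ m_\nu = c: \qquad c - c^3 + \gamma c = 0 \;\Rightarrow\; c = \sqrt{1+\gamma}, \\
&\text{Field on a single flipped node: } \gamma S_i = \gamma\,\xi_i^\nu \sqrt{1+\gamma}, \\
&\text{The flip is reversed if } \gamma\sqrt{1+\gamma} > h_c = \frac{2\sqrt{3}}{9} \;\Longleftrightarrow\; \gamma^2(1+\gamma) > \frac{4}{27}, \\
&\text{with equality at } \gamma = \tfrac{1}{3}: \quad \tfrac{1}{9}\cdot\tfrac{4}{3} = \tfrac{4}{27}.
\end{aligned}$$

Since $\gamma^2(1+\gamma)$ increases monotonically for $\gamma > 0$, the condition holds exactly for $\gamma > 1/3$.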
If

$$-\frac{2\sqrt{3}}{9\gamma} < C_i + S_i < \frac{2\sqrt{3}}{9\gamma},$$

the crosstalk will not be strong enough to reverse the sign of $x_i$ if initially $x_i$ is correctly aligned with $\xi_i^\nu$, but it will nonetheless destroy the stability against a sign flip. In other words, if $x_i$ is initially misaligned, it will not be corrected. We may say that the node is "bistabilized" but not destabilized. (Such an effect cannot occur in the HN where the nodes are not individually bistable.) Since the crosstalk is random, it will in general bistabilize some nodes and not others, with the result that the memory state will be stable against sign flips of certain nodes but not of others. Thus, even though a memory state may be linearly stable, the dynamics may only be able to correct some sign errors, not all. This contrasts with the low-loading case where crosstalk is negligible and there is a single threshold coupling strength, $\gamma = 1/3$, above which any single sign error can be corrected. In general, crosstalk results in non-uniform behaviour among nodes, including different magnitudes of $x_i$ for different $i$.

FIG. 1: A series of remanent overlap histograms for a HN with $N = 2000$ nodes, at different values of the loading fraction $\alpha = p/N$. When the critical loading $\alpha_c = 0.148$ is exceeded, the strong peak at $b_r > 0.95$ decays and another peak appears at $b_r \approx 0.3$.
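The three regimes distinguished in the crosstalk discussion of section II (sign error corrected, node bistabilized, node destabilized) can be summarized in a few lines of code. This is a sketch under the same approximation as the text, neglecting the diagonal term; the function name `classify_nodes` and the integer labels are hypothetical choices for illustration. The input is the net input along the target pattern, $a_i = \xi_i^\nu (S_i + C_i)$.

```python
import numpy as np

H_C = 2.0 * np.sqrt(3.0) / 9.0   # critical field of an isolated bistable node

def classify_nodes(aligned_field, gamma):
    """Classify nodes by a_i = xi_i * (S_i + C_i).

    Returns +1 where a_i >  h_c/gamma  (a flipped sign would be restored),
             0 where |a_i| <= h_c/gamma (bistabilized: either sign is stable),
            -1 where a_i < -h_c/gamma  (destabilized: a correct sign is reversed).
    """
    a = np.asarray(aligned_field, dtype=float)
    threshold = H_C / gamma
    return np.where(a > threshold, 1, np.where(a < -threshold, -1, 0))

a = [1.0, 0.5, 0.0, -0.5, -1.0]
print(classify_nodes(a, gamma=0.5))   # weak coupling: wide bistable window
print(classify_nodes(a, gamma=2.0))   # strong coupling: narrow bistable window
```

The width of the bistable window, $2h_c/\gamma$, shrinks as the coupling grows, which is why the strongly coupled BGN behaves more like the HN, where the window has zero width.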



III. STABILITY OF THE MEMORY STATES AND REMANENT OVERLAP 

To examine the stability of the memory states, we followed a procedure similar to that of reference [?]. Using an ensemble of realizations of the random patterns, we made a number of trials in which the network was initialized to the state $\mathbf{x} = \boldsymbol{\xi}^\nu$ for some pattern $\nu$. The initial bit overlap $b_\nu$ was thus equal to 1. We then allowed the state to evolve until an attractor was reached. In each case we measured the final energy as well as the bit overlap between the initial and final states, to which we refer as the remanent bit overlap $b_r$. We then constructed histograms showing the probabilities $P(b_r)$ for $b_r$ falling in intervals of width 0.05. Each of the histograms discussed in this section represents an ensemble of at least 200 initial conditions. If the memory state is entirely stable, as is the case at very low loading, then after the dynamics converges the final bit overlap will still be equal to 1. For higher loading, however, the crosstalk fields may introduce sign errors so that $b_r < 1$.
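The procedure can be sketched for the Hopfield case as follows. This is a minimal illustration, not the code used for the results reported here: the sequential sign-update rule, the small network size, the number of trials and all function names are assumptions. For the BGN, the relaxation step would be replaced by an integration of equation (4).

```python
import numpy as np

def hebb_weights(patterns):
    """Hebb rule with zero diagonal."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def hopfield_relax(s0, w, max_sweeps=100):
    """Deterministic sequential updates s_i <- sign(h_i) until no spin changes."""
    s = s0.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(s.size):
            new = 1.0 if w[i] @ s >= 0.0 else -1.0
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

def remanent_overlaps(n, p, trials, rng):
    """Remanent overlap b_r for trajectories started exactly on a stored pattern."""
    b_r = np.empty(trials)
    for t in range(trials):
        xi = rng.choice([-1.0, 1.0], size=(p, n))   # new pattern realization
        s_final = hopfield_relax(xi[0], hebb_weights(xi))
        b_r[t] = xi[0] @ s_final / n
    return b_r

rng = np.random.default_rng(2)
b_r = remanent_overlaps(n=200, p=5, trials=20, rng=rng)   # alpha = 0.025
counts, edges = np.histogram(b_r, bins=np.linspace(-1.0, 1.0, 41))  # width 0.05
print("P(b_r) in the top bin:", counts[-1] / counts.sum())
```

At this low loading essentially every trial should stay on the stored pattern; raising `p` towards and beyond $0.14\,n$ is the regime in which a second, low-overlap peak is expected to appear.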



A. HN and BGN with high $\gamma$

First consider the case of the HN. Figure 1 shows a series of $P(b_r)$ histograms for the HN at different values of $\alpha = p/N$. (The data here are our own, but the results are comparable to those given in [?]. We include them for comparison with BGN results.) At lower levels of loading, crosstalk introduces few errors, so $b_r > 0.95$ in the large majority of cases. However, as $\alpha$ increases beyond the critical value $\alpha_c \approx 0.148$, the high-$b_r$ peak of the distribution begins to vanish and a second, broader peak begins to grow in the vicinity of $b_r \approx 0.3$. The states in the second peak are spin glass states. As shown in figure 2, this transition becomes sharper with increasing network size, and finite-size scaling analysis shows behaviour characteristic of a first-order phase transition in the thermodynamic limit [?]. In the limit $N \to \infty$ the associative memory fails suddenly as the critical loading is exceeded: the remanent overlap drops abruptly from near 1 to 0.3. This nonzero value of the remanent overlap above the critical loading was noted in [?] and attributed to replica symmetry breaking, as the replica symmetric theory predicts that $b_r$ should drop to zero above the phase transition. This phenomenon is related to the non-zero remanent magnetization of a spin glass [?].

In the BGN with $\gamma = 2$, a similar transition evidently occurs at $\alpha_c \approx 0.1$. As evidence, in figure 3 we show two series of histograms at increasing $N$, one below the suspected transition and one above. As in the HN case, the transition grows sharper with increasing network size. Below the critical loading, the high-$b_r$ peak remains robust as $N$ increases, while above the critical loading the high-$b_r$ peak shrinks with increasing network size and the low-$b_r$ peak grows. Two quantitative differences are that the critical loading $\alpha_c$ is lower in the BGN case, $\alpha_c \approx 0.1$, while the average remanent overlap above the critical loading is higher, near 0.45.

FIG. 2: Remanent overlap histograms for the Hopfield network below the critical loading (left column) and above (right column), showing finite size effects. The transition becomes sharper as $N$ increases.

FIG. 3: Remanent overlap histograms for the BGN with $\gamma = 2$ show evidence of a first-order transition at $0.09 < \alpha_c < 0.11$. Note the similarity with figure 2.

FIG. 4: Scatter plots of final energies and remanent overlaps for trajectories starting from memory patterns, for BGN above critical loading. (A) BGN with $N = 2000$, $\gamma = 2$ and $\alpha = p/N = 0.11$. Note that the overlap is strongly correlated with the energy and there is a gap in energy between the high- and low-$b_r$ states. This gap represents the latent heat of the phase transition. (B) BGN with $N = 2000$, $\gamma = 1$ and $\alpha = 0.2$. As in the $\gamma = 2$ case, the energy is strongly correlated with $b_r$, but the high and low groups overlap and the energy difference is smaller.

The first-order nature of the transition is confirmed by examining the energies of the final states. These energies and the overlaps are shown in a scatter plot in figure 4A. The spin glass states associated with the low-$b_r$ peak are clustered at energies below those of the retrieval states. The gap in energy between these two clusters corresponds to the latent heat of the phase transition.



B. BGN at low $\gamma$

For $\gamma = 0.5$, in contrast to $\gamma = 2$, the BGN's behaviour differs markedly from that of the HN. A series of $P(b_r)$ histograms for different values of $p/N$ is shown in figure 5. Two features are evident. First, the stored patterns remain stable with few errors even up to high levels of storage, $\alpha = 0.3$. Second, there is no evidence of a discontinuous memory failure; rather, the retrieval quality as measured by $b_r$ appears to degrade continuously as $\alpha$ increases. No second peak appears in the histograms; instead, the high-$b_r$ peak first spreads and then begins to drift downward as errors accumulate.

At an intermediate value $\gamma = 1$ the $P(b_r)$ histograms (figure 6) suggest a first-order transition, although the evidence is less pronounced than in the $\gamma = 2$ case. A second peak appears above the transition, and the high-$b_r$ peak shows clear signs of shrinking as $N$ increases, but the tails of the two peaks overlap substantially. The greater overlap between the peaks comes about for two reasons. First, the high-$b_r$ peak above the critical loading is broader than in the $\gamma = 2$ case. Second, the remanent magnetisation is much higher (i.e., the drop in $b_r$ at the critical point is much smaller). The latent heat is also much smaller, as can be seen from the scatter plot of the energy (figure 4B). The critical loading, or storage capacity, is approximately 0.17, higher than for $\gamma = 2$ and higher than for the HN.

FIG. 5: Remanent bit overlap histograms for the BGN with $N = 1000$ and $\gamma = 0.5$, at a series of increasing values of the loading factor $\alpha$. Few errors occur even at $\alpha = 0.3$, and the memory degrades gradually rather than abruptly with increasing $\alpha$.



IV. ATTRACTORS, BASINS AND THE ENERGY LANDSCAPE AT HIGH LOADING 

In the previous section, we examined the trajectories of initial conditions corresponding to memorized patterns and 
determined whether these trajectories remain close to the pattern or move away from it. Such experiments, however, 
probe only one aspect of network performance. We are interested not only in the stability of memory states but also 
in the sizes of their basins of attraction. The function of associative memory depends on the ability of the dynamics 
to correct partially corrupted patterns. More generally, we are interested in the evolution of the energy landscape with increasing $\alpha$. To address these issues, we performed two additional sets of numerical experiments. First, we examined the attractors reached from a large number of random initial conditions to obtain a uniform sampling of the phase space and a broad picture of the energy landscape. Second, we examined the fate of initial conditions at specified Hamming distances from memory patterns. The latter set of experiments probes the landscape in the vicinity of the memory states. Results of similar experiments were given in ref. [2] for the case of low loading.



A. Evolution of the energy landscape: attractors reached from random initial states 

In the case of the HN, it is known that in the thermodynamic limit the memory states are the lowest energy states for $\alpha < 0.05$, while for higher values of $\alpha$ spin glass states arise which have lower energies. Up to $\alpha_c \approx 0.148$ the memory states remain local minima of the energy even though they are not the global minima. Above $\alpha_c$ they cease even to be local minima and therefore become unstable.¹ The drop in energy from the memory states to the spin glass states at $\alpha_c$ is the latent heat. One way to observe the evolution of the energy landscape is to examine the attractors reached from an ensemble of random initial conditions which effectively samples the configuration space. Figures 7-9 show histograms for the energies of attractors sampled in this manner. In each case, we sampled a total of at least 200 random initial conditions with several realizations of the random patterns. In figure 7, for the HN, we can see that at low loading the attractors are separated into two groups, the retrieval states at $E/N = -0.5$ and spurious states at a range of higher energies. The probability of retrieving a memory state from a random initial condition is high. With increasing $\alpha$, the spurious states move to lower energies until they are below the memory states. At the same time, their basins of attraction take up a larger portion of the configuration space, as is apparent from the growing size of the spin glass peak in the histogram and the shrinking size of the retrieval state peak. Figure 8 shows that for the BGN with $\gamma = 2$ the evolution is qualitatively similar. In figure 9, for $\gamma = 0.75$, we observe that an additional effect of high loading is to destabilize the uncondensed states. At low loading, the uncondensed states dominate the configuration space: almost all random initial conditions land on an uncondensed state, as was noted in [2]. The other two types of state begin to show their presence as the loading is increased, while the uncondensed states eventually disappear.

FIG. 6: Remanent overlap histograms for BGN with $\gamma = 1$ are consistent with a first-order transition at $0.16 < \alpha_c < 0.18$.
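The reference energies quoted in this discussion follow directly from the Hamiltonians. The sketch below evaluates the energy per node for a single stored pattern; the function names are illustrative, and the Hopfield energy is assumed to take the standard form $E = -\frac{1}{2}\sum_{ij} w_{ij} s_i s_j$.

```python
import numpy as np

def hopfield_energy_per_node(s, w):
    """E/N with E = -(1/2) sum_ij w_ij s_i s_j (standard Hopfield form, assumed here)."""
    return -0.5 * (s @ w @ s) / s.size

def bgn_energy_per_node(x, w, gamma):
    """E/N for the Hamiltonian of equation (2): local quartic term plus coupling term."""
    local = np.sum(x**4 / 4.0 - x**2 / 2.0)
    coupling = -0.5 * gamma * (x @ w @ x)
    return (local + coupling) / x.size

N = 100
xi = np.ones(N)
xi[::2] = -1.0                      # one stored pattern of +/-1 entries
w = np.outer(xi, xi) / N            # Hebb weights for a single pattern
np.fill_diagonal(w, 0.0)

e_hn = hopfield_energy_per_node(xi, w)            # retrieval state of the HN
e_local = bgn_energy_per_node(xi, w, gamma=0.0)   # nodes at +/-1, coupling switched off
print(e_hn, e_local)
```

With the zero-diagonal weights the Hopfield retrieval energy is $-\frac{1}{2}(1 - 1/N) = -0.495$ for $N = 100$, approaching $-0.5$ for large $N$. The purely local energy of nodes sitting at $\pm 1$ is $-0.25$, the value around which the uncondensed states cluster.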



B. Retrieval of patterns from corrupted versions 

To obtain information about the landscape in the vicinity of the memory states and about the shapes of their basins of attraction, we examined the fates of initial conditions which were not random but rather at specified initial overlaps with particular stored patterns. As in [2], these initial configurations were generated by starting with a particular "target" pattern and flipping the signs of a specified number of randomly chosen nodes. For each value of the initial overlap $b_{\mathrm{init}}$, we generated an ensemble of initial conditions for several different realizations of the random set of memory patterns. We then evolved these states until the dynamics converged, and evaluated $b_{\mathrm{final}}$, the final overlap with the target pattern, for each trajectory. If $b_{\mathrm{final}} = 1$, this signifies that all signs which were initially flipped have been corrected and the target pattern has been retrieved perfectly. If $b_{\mathrm{init}} < b_{\mathrm{final}} < 1$, then the pattern has been retrieved imperfectly. The final state is closer to the stored pattern, but not all errors have been corrected. If $b_{\mathrm{init}} > b_{\mathrm{final}}$, then the trajectory has moved farther away from the stored pattern. As discussed in section II, the ability of the network to correct sign errors depends on the competition among the signal, the crosstalk, and the local potential. The initial states currently discussed have at least a moderately large overlap with the target pattern, resulting in a signal, and random overlaps with the other memory patterns, resulting in crosstalk.

¹ For a schematic illustration of the evolution of the energy landscape, see fig. 2.18 of reference [?].

FIG. 7: Energy histograms of $P(E/N)$ for attractors reached from random initial states of the HN with $N = 1000$ nodes. As the number $p$ of stored patterns increases, the peak at $E/N = -0.5$ corresponding to the retrieval states shrinks (and also spreads slightly) while the spin glass states drift downward in energy until they are lower than the retrieval state energies.
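The construction of the corrupted initial conditions and the classification of outcomes can be sketched as follows. Flipping $k$ of the $N$ signs of a pattern gives $b_{\mathrm{init}} = 1 - 2k/N$. The function names, the tolerance and the extra label "unchanged" (for trajectories that end with $b_{\mathrm{final}} = b_{\mathrm{init}}$) are assumptions introduced for this illustration.

```python
import numpy as np

def corrupt(pattern, n_flips, rng):
    """Flip the signs of n_flips distinct, randomly chosen nodes of a pattern."""
    x = pattern.copy()
    idx = rng.choice(x.size, size=n_flips, replace=False)
    x[idx] *= -1.0
    return x

def bit_overlap(x, pattern):
    """Bit overlap b between a state x and a +/-1 pattern."""
    return pattern @ np.sign(x) / pattern.size

def outcome(b_init, b_final, tol=1e-12):
    """Label a trajectory by comparing its final and initial overlaps."""
    if b_final >= 1.0 - tol:
        return "perfect retrieval"
    if b_final > b_init + tol:
        return "imperfect retrieval"
    if b_final < b_init - tol:
        return "moved away"
    return "unchanged"

rng = np.random.default_rng(4)
N = 1000
target = rng.choice([-1.0, 1.0], size=N)

x0 = corrupt(target, n_flips=250, rng=rng)   # 250 flips out of 1000
b_init = bit_overlap(x0, target)
print(b_init, outcome(b_init, 0.9))
```

With 250 of 1000 signs flipped the initial overlap is $1 - 2 \cdot 250/1000 = 0.5$; a trajectory ending at $b_{\mathrm{final}} = 0.9$ would count as an imperfect retrieval.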

Consider now the HN. At low loading, there is very little crosstalk. The crosstalk term is typically of order $\sqrt{p/N}$. If the HN is set in an initial condition whose overlap with the target pattern $\boldsymbol{\xi}$ is $b_{\mathrm{init}} > \sqrt{p/N}$, then most nodes experience a net field which is aligned with the target pattern. Nodes which are initially misaligned with the target state ($x_i = -\xi_i$) will tend to change their signs and align with $\xi_i$. Each node which realigns in this way increases the value of $b$ and thus increases the strength of the signal acting on the remaining nodes. As a result, even if in the initial state some fraction of the nodes experience a net field opposed to $\xi_i$, eventually the growing signal may overcome the crosstalk and correct those nodes as well. Therefore for low loading, as long as the initial state has $b_{\mathrm{init}} > \sqrt{p/N}$, the probability of completely retrieving the target pattern is close to unity. As the loading ratio $\alpha$ increases, however, the typical crosstalk becomes stronger, and a higher signal is required to overcome the crosstalk noise. Therefore sign errors are not likely to be corrected unless the initial overlap is above a threshold, which grows higher with increasing $\alpha$. If the crosstalk is too large, then some signs which are initially aligned with the pattern may be flipped, and the state may move away from the target pattern instead of toward it. Each node which flips out of alignment with the target pattern reduces the size of the overlap and hence of the signal, which makes other nodes more susceptible to crosstalk-induced errors, and a cascade of errors can occur. The critical loading $\alpha_c$ is the level at which even a state with $b_{\mathrm{init}} = 1$ becomes unstable against such a cascade of errors.
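The low-loading behaviour described above can be reproduced in a small experiment. The sketch below is illustrative only: it assumes sequential deterministic sign updates, a small network ($N = 200$, $p = 5$, so $\sqrt{p/N} \approx 0.16$) and an initial overlap of 0.6, none of which are the settings used for the figures.

```python
import numpy as np

def hebb_weights(patterns):
    """Hebb rule with zero diagonal."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def hopfield_relax(s0, w, max_sweeps=100):
    """Deterministic sequential updates s_i <- sign(h_i) until no spin changes."""
    s = s0.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(s.size):
            new = 1.0 if w[i] @ s >= 0.0 else -1.0
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

rng = np.random.default_rng(3)
N, p, n_flips = 200, 5, 40        # alpha = 0.025, b_init = 1 - 2*40/200 = 0.6

b_final = []
for _ in range(10):
    xi = rng.choice([-1.0, 1.0], size=(p, N))
    w = hebb_weights(xi)
    s0 = xi[0].copy()
    s0[rng.choice(N, size=n_flips, replace=False)] *= -1.0   # corrupt the target
    b_final.append(xi[0] @ hopfield_relax(s0, w) / N)
b_final = np.array(b_final)
print("final overlaps:", b_final)
```

Because the signal (0.6) is several times larger than the typical crosstalk, the flipped nodes should realign and the final overlaps should be at or very near 1. Increasing `p` at fixed `N` is the way to probe the rising retrieval threshold.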

Figure 10A shows a scatter plot of $b_{\mathrm{final}}$ vs. $b_{\mathrm{init}}$ for a HN slightly below critical loading. There is a threshold overlap for retrieval. If $b_{\mathrm{init}} > 0.5$ the signal is strong enough to correct errors and the majority of trajectories flow towards the target pattern. For $b_{\mathrm{init}} < 0.5$ the majority of trajectories move instead away from the target pattern. As $\alpha$ increases, the threshold value of $b_{\mathrm{init}}$ for retrieval increases until at $\alpha_c$ it reaches 1. The plot in figure 10A is for a network with $N = 2000$; experiments with networks of different sizes reveal that the retrieval threshold becomes sharper as $N$ increases.

In the BGN, on the other hand, the dynamics of error correction is more complicated due to the local potential. At strong coupling $\gamma > 1$, the potential barriers against sign flips are less important than at weak coupling. As a result the BGN in this regime behaves in many respects like the HN. It is not surprising, then, that the scatter plot of $b_{\mathrm{final}}$ vs. $b_{\mathrm{init}}$ for a BGN with $\gamma = 2$ slightly below its critical loading (figure 10B) appears qualitatively similar to the corresponding figure 10A for the HN. There is a threshold (approximately $b_{\mathrm{init}} = 0.6$) below which the probability of fully retrieving the target pattern drops sharply. Above this threshold, the signal is evidently strong enough to correct most sign errors. A key difference, however, is that even below this threshold the average $b_{\mathrm{final}}$ is larger than $b_{\mathrm{init}}$. This means that the majority of trajectories move toward the target pattern rather than away from it, but only some of the sign errors are corrected, not all.

FIG. 8: Attractor energies from random initial conditions for BGN with $\gamma = 2$. Qualitative behaviour resembles that of the HN as shown in figure 7. One difference is that the spreading of the retrieval state energies with increasing loading is more pronounced.

FIG. 9: Attractor energies from random initial conditions for BGN with $N = 1000$ and $\gamma = 0.75$. For low loading, the uncondensed states (with energy $E/N \approx -0.25$) fill most of the configuration space. The other two peaks in the histogram are very small. As the loading increases, however, the uncondensed states disappear. As in all cases, the "spin glass" peak grows larger with increasing $\alpha$ and shifts downward in energy.

FIG. 10: Scatter plots of $b_{\mathrm{final}}$ vs. $b_{\mathrm{init}}$ for networks slightly below critical loading. Points show $b_{\mathrm{final}}$ for an ensemble of initial conditions with specified $b_{\mathrm{init}}$. The average of $b_{\mathrm{final}}$ is shown by the solid curve. The dotted diagonal line $b_{\mathrm{final}} = b_{\mathrm{init}}$ is drawn for comparison: points above the line have $b_{\mathrm{final}} > b_{\mathrm{init}}$. (A) HN with $N = 2000$, $p = 260$. (B) BGN with $N = 2000$, $\gamma = 2$ and $p = 170$. In both cases, the retrieval quality as measured by $b_{\mathrm{final}}$ drops sharply for $b_{\mathrm{init}} < 0.6$. For the BGN, however, the average $b_{\mathrm{final}}$ is always greater than $b_{\mathrm{init}}$.

In the case of the BGN with $\gamma = 0.5$, the local potential barriers are important, and interesting dynamics results from the competition among the signal from the target pattern, the crosstalk from other patterns, and the local potential. In the $\alpha \ll 1$ case [2], sign errors can be corrected only if the signal is strong enough to overcome the local potential barriers, and therefore there is a threshold value of the initial overlap which is much larger than $\sqrt{p/N}$. For the case $N = 1000$, $p = 5$, for example, we find that this threshold is approximately $b_{\mathrm{init}} = 0.5$. For $b_{\mathrm{init}} > 0.5$ there is a large probability that the target pattern is retrieved perfectly, while for $b_{\mathrm{init}} < 0.5$ there is a large probability that the network will be stuck in an uncondensed state with $b_{\mathrm{final}} = b_{\mathrm{init}}$. In the intermediate range $0.4 < b_{\mathrm{init}} < 0.6$, there is also a significant probability that the trajectory is attracted to an asymmetric spurious state in which $b_{\mathrm{final}}$ is large but not unity and there are larger than random overlaps with one or more other memory patterns. This behaviour is illustrated by the scatter plot of figure 11A. As the loading increases (figure 11A-D), something surprising happens: at first, the basins of attraction of the memory states expand slightly, contrary to what one would expect from the HN. The frontier of the uncondensed states is pushed back to lower values of $b_{\mathrm{init}}$. For $p = 50$, or $\alpha = 0.05$ (fig. 11D), almost all states with $b_{\mathrm{init}} > 0.1$ undergo some motion toward the target pattern, even if the pattern is not retrieved perfectly. As $p$ increases further, the $b_{\mathrm{final}}$ vs. $b_{\mathrm{init}}$ plot preserves the approximate shape of fig. 11D.

The dynamics of retrieval and error correction for the BGN with weak coupling and high loading is evidently quite
different from that of the HN. In the $\alpha = 0.05$ case of figure 11D, $b_{\mathrm{final}}$ is always 1 if
$b_{\mathrm{init}} = 1$, which means that the memory state is stable. However, it is retrieved only imperfectly when sign
errors are introduced: if $b_{\mathrm{init}} < 1$ then $b_{\mathrm{init}} < b_{\mathrm{final}} < 1$. This indicates that
some nodes remain bistable and cannot be corrected. If the initial state is close to the target pattern, then the
majority of errors are corrected, but that fraction decreases with decreasing $b_{\mathrm{init}}$. Such a partial
correction of errors does not often occur in the HN case. In the latter case, an initial condition either flows all the
way to the retrieval state or moves away from it toward another attractor. The retrieval state may itself have a small
number of errors due to crosstalk, but the presence of these errors does not depend on the initial state. The energy
landscape of the HN in the neighborhood of a memory state apparently has the shape of a smooth basin: once the basin is
entered, the trajectory usually runs without obstruction to the attractor at the bottom. For the weakly coupled BGN, on
the other hand, the landscape appears to have the structure of a "funnel," i.e., a sequence of local minima at
decreasing energies, with low potential barriers separating each state from the next. There is a region of
configuration space which has an overall tilt toward the retrieval state, but in which there are many local minima
which may trap the trajectory before it reaches the retrieval state. This is illustrated schematically in figure 12.
Funnel-shaped energy landscapes were first suggested in the context of protein folding dynamics.
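
The trapping of individual nodes can be made concrete with a rough estimate. The following is only a sketch: it assumes that the on-site potential has the quartic double-well form $x^4/4 - x^2/2$, and it treats the local field $h_i$ (a symbol introduced here for illustration) as frozen while node $i$ relaxes.

```latex
\begin{align}
  \frac{dx_i}{dt} &= x_i - x_i^3 + \gamma h_i ,
  \qquad h_i \equiv \sum_j w_{ij} x_j , \\
  \max_{x > 0}\,\bigl(x - x^3\bigr) &= \frac{2}{3\sqrt{3}} \approx 0.385
  \quad \text{at } x = \frac{1}{\sqrt{3}} , \\
  \gamma\,|h_i| &< \frac{2}{3\sqrt{3}}
  \;\Longrightarrow\; \text{node } i \text{ retains two stable states.}
\end{align}
```

Under these assumptions a node with the wrong sign can be flipped only if $\gamma |h_i|$ exceeds roughly 0.385. For small $\gamma$ this requires a large field, and hence a large overlap with the target pattern, which is consistent with the partial error correction described above.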



V. A COMMENT ON SMALL NETWORKS 



So far, this paper and the companion paper [2] have focused mainly on large networks of $N = 1000$ or more.
However, some applications of neural network algorithms to robotics and other areas make use of networks of only 20
or 100 nodes. Experimental studies of BGN-like chemical reactor networks [14][19] used fewer than 10 nodes. The
current work on the BGN was inspired in part by results suggesting that the BGN could store many more patterns
than the HN, with fewer spurious attractors [12]. The latter results were inferred from a few selected cases using
very small networks, and so we attempted to test how generic these results are for a variety of small networks as
well as for larger networks.

It should be noted that with very small networks, there are great fluctuations in properties depending on the
particular set of memory patterns chosen, as it is impossible to ignore the mutual correlations among patterns.
Results for the maximum storage capacity of a network are well-defined only in the thermodynamic limit $N \to \infty$,
and for small $N$ even an HN may in particular cases be able to store more than $0.14N$ stable patterns. For these
reasons one cannot draw strong general conclusions about storage capacity based on small networks alone, but it is
nonetheless instructive to make some comparative studies of pattern stability in small networks. For several small
values of $N$ and $p$, we generated random sets of stored patterns and tested their stability, using the HN and the BGN
at several different values of $\gamma$. We counted the average fraction of memory patterns which were stable,
$f_{\mathrm{stable}}$. In addition, we estimated the fraction of configuration space $V_{\mathrm{spur}}$ occupied by
spurious attractors by following the trajectories of random initial conditions inside the hypercube $|x_i| < 2$. These
results are collected in table I. In all cases the results were averaged over at least 100 sets of randomly generated
patterns. In generating random sets of patterns, we did not eliminate cases in which two or more patterns are identical.
The results show that, contrary to the selected examples discussed in [1] and [12], the spurious attractors do not
generically occupy much less configuration space in the BGN than in the HN (in fact they normally occupy more,
especially at low values of $\gamma$). However, for the most part the percentage of memory patterns which are stable is
larger in the BGN than in the HN as long as $\gamma < 1$. As with larger networks, the BGN becomes most similar to the HN
when $\gamma > 1$, while lower $\gamma$ leads to increased pattern stability. The increased pattern stability at low
$\gamma$ is associated, however, with smaller basins of attraction for the memory states and therefore with a greater
volume of phase space occupied by spurious states.

Footnote 2: In the thermodynamic limit there is an approximate permutation symmetry among the memory patterns, so that
in general either all will be stable or none will be. By contrast, in small networks it is not unusual for some
memorized patterns to remain stable while others are not.
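
The two measurements described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the code behind table I: it assumes the same gradient flow $dx_i/dt = x_i - x_i^3 + \gamma \sum_j w_{ij} x_j$ with zero-diagonal Hebbian weights, tests stability by starting at $x_i = \xi_i$, and counts a final state as spurious when its sign pattern matches neither a stored pattern nor its inverse. All function names and default parameters are hypothetical.

```python
import numpy as np

def hebb_weights(patterns):
    """Hebb rule with zero diagonal (assumed convention)."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def relax(x0, w, gamma, dt=0.05, steps=2000):
    """Euler integration of the assumed flow dx/dt = x - x^3 + gamma * w x."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x += dt * (x - x**3 + gamma * (w @ x))
    return x

def stable_fraction(patterns, gamma):
    """Fraction of stored patterns whose signs survive relaxation (f_stable)."""
    w = hebb_weights(patterns)
    stable = [np.array_equal(np.sign(relax(xi, w, gamma)), xi) for xi in patterns]
    return float(np.mean(stable))

def spurious_fraction(patterns, gamma, rng, n_samples=50):
    """Monte Carlo estimate of V_spur from random starts in the hypercube |x_i| < 2."""
    w = hebb_weights(patterns)
    n = patterns.shape[1]
    # Stored patterns and their inverses are both treated as memory states here.
    targets = {tuple(s * xi) for xi in patterns for s in (1, -1)}
    n_spurious = 0
    for _ in range(n_samples):
        x0 = rng.uniform(-2.0, 2.0, size=n)
        final_signs = tuple(np.sign(relax(x0, w, gamma)).astype(int))
        n_spurious += final_signs not in targets
    return n_spurious / n_samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patterns = rng.choice([-1, 1], size=(3, 20))   # p = 3, N = 20 (illustrative)
    for gamma in (0.5, 1.0, 2.0):
        print(gamma, stable_fraction(patterns, gamma),
              spurious_fraction(patterns, gamma, rng))
```

Averaging both quantities over many independently drawn pattern sets, as described in the text, would be needed before comparing values of $\gamma$.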



VI. DISCUSSION 



We have studied the properties of the BGN at high memory loading $\alpha$. Our results can be summarized as follows:
For high values of $\gamma$, such as 2, there is a first-order transition similar to that of a HN. For $\gamma = 2$, the
transition occurs at a critical loading $\alpha_c \approx 0.1$, which is lower than the critical loading of
$\alpha \approx 0.148$ for the HN. As $\gamma$ decreases, the critical loading increases and the phase transition
evidently becomes weaker and eventually disappears. For $\gamma = 1$ the critical loading is apparently
$\alpha_c \approx 0.17$ (higher than for the HN) and the phase transition is much less pronounced for the finite-size
networks we have studied. For $\gamma = 0.5$ there is no evidence of a phase transition at all and patterns are stable
with very few errors up to $\alpha \approx 0.3$.

FIG. 12: Schematic picture of a funnel, or bumpy basin. Most trajectories travel some distance toward the global minimum
(representing for example a memory state) but become trapped in a local minimum (a state with some errors) before reaching
the bottom.

A phenomenon that occurs in the BGN much more than in the HN is the partial retrieval of a pattern, whereby
the dynamics corrects some sign errors in a pattern without correcting all of them. This is especially noticeable in
the case of low $\gamma$ and high $\alpha$. This phenomenon suggests that in this case the energy landscape in the
vicinity of a stored pattern has the shape of a funnel rather than a smooth basin of attraction. By a smooth basin of
attraction, we mean a connected neighborhood which is sloped toward an attractor and in which there are few local minima
that might obstruct a trajectory once it has begun to flow toward the attractor. By a funnel we mean a region with an
overall average slope toward an attractor which however contains many other local minima in which the trajectory
might become stuck before reaching the bottom. The presence of these local minima is due to the bistability of the
individual nodes and the local potential barriers against spin flips. These same potential barriers are also responsible
for stabilizing the patterns. They make it less likely that crosstalk noise will induce an error in a pattern, but they
also can inhibit the correction of an error which is present initially. Funnel-shaped landscapes were first examined
in the context of protein folding dynamics [16][17]. It was suggested that at a finite temperature such a landscape
would allow the protein dynamics to find the global energy minimum efficiently in spite of the presence of numerous
local minima separated by potential barriers. If indeed the landscape of the BGN at low $\gamma$ forms a funnel, then it
is possible that the introduction of some stochastic noise (i.e., finite temperature) could improve pattern retrieval
by allowing trajectories to jump over the comparatively small potential barriers into lower energy minima, just as in
the protein case. The effect of stochastic noise could be a fruitful subject for further study. An interesting question
is whether there is an optimum level of noise which would improve the pattern retrieval ability of the BGN with
low $\gamma$ while at the same time maintaining the larger storage capacity. Intriguing questions remain concerning the
dynamics of the BGN at low $\gamma$. The apparent initial expansion of the basins of attraction of the memory states with
increasing loading is counterintuitive, and the patterns visible in the scatter plots of $b_{\mathrm{final}}$ versus
$b_{\mathrm{init}}$ hint at some dynamics which is not yet fully understood.
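
One way such a study of stochastic noise might be set up is sketched below. This is a hypothetical illustration, not a method used in this paper: it assumes Langevin dynamics $dx_i = -(\partial H/\partial x_i)\,dt + \sqrt{2T}\,dW_i$ with the drift $x_i - x_i^3 + \gamma \sum_j w_{ij} x_j$, integrated with the Euler-Maruyama scheme. The function name, the temperature parameter, and the step size are assumptions, and whether any noise level actually improves retrieval is exactly the open question raised above.

```python
import numpy as np

def langevin_relax(x0, w, gamma, temperature, rng, dt=0.01, steps=5000):
    """Euler-Maruyama integration of dx = (x - x^3 + gamma * w x) dt + sqrt(2 T) dW.

    With temperature = 0 this reduces to deterministic gradient descent.
    """
    x = np.array(x0, dtype=float)
    noise_scale = np.sqrt(2.0 * temperature * dt)
    for _ in range(steps):
        drift = x - x**3 + gamma * (w @ x)
        x += dt * drift + noise_scale * rng.standard_normal(x.size)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 10                                   # illustrative sizes
    patterns = rng.choice([-1.0, 1.0], size=(p, n))
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    x0 = patterns[0].copy()
    x0[: n // 5] *= -1.0                             # flip 20% of the signs
    for temperature in (0.0, 0.02, 0.05):
        xf = langevin_relax(x0, w, 0.5, temperature, rng)
        print(temperature, float(np.mean(np.sign(xf) * patterns[0])))
```

Scanning the temperature and averaging the final overlap over many pattern sets and noise realizations would be one way to look for an optimum noise level.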